Improving Space-Efficiency in Temporal Text-Indexing
نویسندگان
چکیده
Support for temporal text-containment queries is of interest in a number of contexts. In previous papers we have presented two approaches to temporal text-indexing, the V2X and ITTX indexes. In this paper, we first present improvements to the previous techniques. We then perform a study of the space usage of the indexing approaches based on both analytical models and results from indexing temporal text collections. These results show for what kind of document collections the different techniques should be employed. The results also show that regarding space usage, the new ITTX/VIDPI technique proposed in this paper is in most cases superior to V2X, except in the case of patterns of high number of new documents relative to number of updated documents.
منابع مشابه
یک روش مبتنی بر خوشهبندی سلسلهمراتبی تقسیمکننده جهت شاخصگذاری اطلاعات تصویری
It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Many efforts have been done in order to develop multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...
متن کاملA Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine
Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...
متن کاملImproving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
متن کاملLightweight Random Indexing for Polylingual Text Classification
Multilingual Text Classification (MLTC) is a text classification task in which documents are written each in one among a set L of natural languages, and in which all documents must be classified under the same classification scheme, irrespective of language. There are two main variants of MLTC, namely Cross-Lingual Text Classification (CLTC) and Polylingual Text Classification (PLTC). In PLTC, ...
متن کاملGeometric Near-neighbor Access Tree (GNAT) revisited
Geometric Near-neighbor Access Tree (GNAT) is a metric space indexing method based on hierarchical hyperplane partitioning of the space. While GNAT is very efficient in proximity searching, it has a bad reputation of being a memory hog. We show that this is partially based on too coarse analysis, and that the memory requirements can be lowered while at the same time improving the search efficie...
متن کامل